Salesforce Sued Over Alleged Copyright Infringement in AI Training Data
Authors E. Molly Tanzer and Jennifer Gilmore have filed a class action lawsuit against Salesforce in San Francisco federal court, alleging that the company illegally used copyrighted books to train its XGen AI models. The complaint claims that Salesforce initially cited the "RedPajama-Books" dataset in June 2023, then deleted references to it and rebranded the data as "publicly available."
CEO Marc Benioff previously acknowledged concerns about AI training data provenance, telling Bloomberg that companies "ripped off" data and "all the training data has been stolen." The lawsuit alleges ongoing copyright infringement through Salesforce's continued use of datasets containing pirated literary works.